AITopics | fenchel-young loss


fenchel-young loss

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Inverse Optimization Latent Variable Models for Learning Costs Applied to Route Problems

Neural Information Processing Systems, Jun-22-2026, 06:41:34 GMT

Learning representations for solutions of constrained optimization problems (COPs) with unknown cost functions is challenging, as models like (Variational) Autoencoders struggle to enforce constraints when decoding structured outputs. We propose an Inverse Optimization Latent Variable Model (IO-LVM) that learns a latent space of COP cost functions from observed solutions and reconstructs feasible outputs by solving a COP with a solver in the loop. Our approach leverages estimated gradients of a Fenchel-Young loss through a non-differentiable deterministic solver to shape the latent space. Unlike standard Inverse Optimization or Inverse Reinforcement Learning methods, which typically recover a single or context-specific cost function, IO-LVM captures a distribution over cost functions, enabling the identification of diverse solution behaviors arising from different agents or conditions not available during the training process. We validate our method on real-world datasets of ship and taxi routes, as well as paths in synthetic graphs, demonstrating its ability to reconstruct paths and cycles, predict their distributions, and yield interpretable latent representations.
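The abstract says the latent space is shaped by estimated gradients of a Fenchel-Young loss computed through a non-differentiable deterministic solver, but it does not spell out the estimator. A minimal sketch, assuming Gaussian perturbations of the edge costs (one standard way to obtain such estimates) and a brute-force shortest-path solver on a hypothetical five-edge graph, could look like this. `EDGES`, `solve` and `perturbed_fy_grad` are illustrative names, not the authors' code.

```python
import random

# Hypothetical toy road network: edge index -> (tail, head).
EDGES = [(0, 1), (0, 2), (1, 3), (2, 3), (1, 2)]


def all_paths(source, target):
    """Enumerate simple source->target paths as 0/1 edge-indicator vectors."""
    paths = []

    def walk(node, visited, used):
        if node == target:
            paths.append([1 if e in used else 0 for e in range(len(EDGES))])
            return
        for e, (u, v) in enumerate(EDGES):
            if u == node and v not in visited:
                walk(v, visited | {v}, used + [e])

    walk(source, {source}, [])
    return paths


PATHS = all_paths(0, 3)


def solve(costs):
    """Deterministic, non-differentiable solver: brute-force minimum-cost path."""
    return min(PATHS, key=lambda y: sum(c * yi for c, yi in zip(costs, y)))


def perturbed_fy_grad(costs, y_obs, sigma=0.5, n_samples=200, seed=0):
    """Monte Carlo gradient, w.r.t. the costs, of a Gaussian-perturbed Fenchel-Young loss.

    For a minimisation solver the gradient is y_obs - E[solve(costs + sigma * Z)].
    """
    rng = random.Random(seed)
    counts = [0] * len(costs)
    for _ in range(n_samples):
        noisy = [c + sigma * rng.gauss(0.0, 1.0) for c in costs]
        for i, yi in enumerate(solve(noisy)):
            counts[i] += yi
    # Average solver output, then subtract it from the observed path.
    return [yo - k / n_samples for yo, k in zip(y_obs, counts)]
```

A step `costs[i] -= lr * g[i]` lowers the cost of edges on the observed route and raises the cost of edges the perturbed solver preferred instead. The latent-variable encoder and decoder of IO-LVM are not modelled here.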

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America > United States (0.67)

Genre: Research Report > Experimental Study (1.00)

Industry: Transportation (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.88)


Any-stepsize Gradient Descent for Separable Data under Fenchel-Young Losses

Neural Information Processing Systems, Jun-18-2026, 02:49:32 GMT

Gradient descent (GD) has been one of the most common optimizers in machine learning. In particular, the loss landscape of a neural network typically sharpens during the initial phase of training, so that the training dynamics hover at the edge of stability. This lies beyond the standard understanding of GD convergence in the stable regime, where the stepsize is chosen sufficiently small. Recently, Wu et al. [63] have shown that GD converges with a much larger stepsize for logistic regression on linearly separable data. Although their analysis hinges on the self-bounding property of the logistic loss, which appears to be a cornerstone for establishing a modified descent lemma, our pilot study shows that other loss functions without the self-bounding property can also make GD attain arbitrarily small loss with a large stepsize.
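To make the large-stepsize regime concrete, here is a minimal pure-Python sketch of full-batch GD on the logistic loss for a hypothetical three-point separable dataset. The gradient of the mean logistic loss here is Lipschitz with constant at most L = 2/3, so the classical sufficient condition asks for a stepsize below 2/L = 3; the sketch uses 10. The toy data is chosen so the run is easy to check, so it does not reproduce the initial oscillations the abstract describes, and it covers only the logistic loss, not the other Fenchel-Young losses the paper studies.

```python
import math


def logistic_loss(margin):
    """Numerically stable log(1 + exp(-margin))."""
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def sigmoid_neg(margin):
    """Stable 1 / (1 + exp(margin)); equals minus the derivative of logistic_loss."""
    if margin >= 0:
        e = math.exp(-margin)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(margin))


def gd_logistic(points, labels, stepsize, steps):
    """Full-batch GD on the mean logistic loss of a linear classifier without bias.

    Returns the final weights and the loss recorded before each update.
    """
    n, d = len(points), len(points[0])
    w = [0.0] * d
    history = []
    for _ in range(steps):
        margins = [y * sum(wj * xj for wj, xj in zip(w, x)) for x, y in zip(points, labels)]
        history.append(sum(logistic_loss(m) for m in margins) / n)
        grad = [0.0] * d
        for x, y, m in zip(points, labels, margins):
            s = sigmoid_neg(m)
            for j in range(d):
                grad[j] -= s * y * x[j] / n
        w = [wj - stepsize * gj for wj, gj in zip(w, grad)]
    return w, history


# Hypothetical separable data; stepsize 10 exceeds the classical bound 2/L = 3.
points = [(1.0, 1.0), (1.0, -1.0), (-2.0, 0.0)]
labels = [1, 1, -1]
w, history = gd_logistic(points, labels, stepsize=10.0, steps=100)
```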

artificial intelligence, machine learning, separation margin, (14 more...)

Neural Information Processing Systems

Country: Asia (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.88)

Industry: Education > Educational Setting > Online (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.85)


Establishing Linear Surrogate Regret Bounds for Convex Smooth Losses via Convolutional Fenchel-Young Losses

Neural Information Processing Systems, Jun-16-2026, 01:23:25 GMT

Surrogate regret bounds, also known as excess risk bounds, bridge the gap between the convergence rates of surrogate and target losses. The regret transfer is lossless if the surrogate regret bound is linear. While convex smooth surrogate losses are appealing, in particular for their efficient estimation and optimization, the community has believed that there is a trade-off between loss smoothness and a linear regret bound. Under this belief, the better optimization and estimation properties of convex smooth surrogate losses would inevitably deteriorate once the regret is transferred onto a target loss. We overcome this dilemma for arbitrary discrete target losses by constructing a convex smooth surrogate loss that, composed with a tailored prediction link, entails a linear surrogate regret bound. The construction is based on Fenchel-Young losses generated by the convolutional negentropy, which are equivalent to the infimal convolution of a generalized negentropy and the target Bayes risk. Consequently, the infimal convolution lets us derive a smooth loss while keeping the surrogate regret bound linear. The infimal convolution additionally yields a consistent estimator of the underlying class probability. Overall, our results are a novel demonstration of how convex analysis bears on both optimization and statistical efficiency in risk minimization.
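The smoothing effect of an infimal convolution, (f □ g)(x) = inf_z f(z) + g(x - z), can be seen on a classic example that is not the paper's convolutional negentropy: convolving the nonsmooth f(z) = |z| with the smooth g(u) = u^2/2 yields the Huber function, which is differentiable everywhere. The grid search below is a hypothetical numerical check of that identity, with illustrative function names.

```python
def inf_conv_on_grid(f, g, x, lo=-5.0, hi=5.0, step=1e-3):
    """Approximate (f □ g)(x) = inf_z f(z) + g(x - z) by a grid search over z in [lo, hi]."""
    n = int(round((hi - lo) / step))
    best = float("inf")
    for k in range(n + 1):
        z = lo + k * step
        best = min(best, f(z) + g(x - z))
    return best


def huber(x):
    """Closed form of |.| infimally convolved with 0.5 * (.)**2 (the Moreau envelope of |.|)."""
    return 0.5 * x * x if abs(x) <= 1.0 else abs(x) - 0.5


# The grid value should match the smooth Huber function up to grid error.
approx = inf_conv_on_grid(abs, lambda u: 0.5 * u * u, 2.5)
```

For |x| >= 1 the result equals |x| - 1/2, so the convolution keeps the linear growth of |x| while removing its kink at 0.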

artificial intelligence, fenchel-young loss, machine learning, (16 more...)

Neural Information Processing Systems

Country:

Asia (0.28)
North America > United States (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Education > Educational Setting > Online (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)


Learning with Fitzpatrick Losses

Neural Information Processing Systems, Mar-21-2026, 14:57:28 GMT

Fenchel-Young losses are a family of loss functions encompassing the squared, logistic and sparsemax losses, among others. They are convex w.r.t. the model output and the target, separately. Each Fenchel-Young loss is implicitly associated with a link function that maps model outputs to predictions. For instance, the logistic loss is associated with the soft argmax link function. Can we build new loss functions associated with the same link function as Fenchel-Young losses? In this paper, we introduce Fitzpatrick losses, a new family of separately convex loss functions based on the Fitzpatrick function. A well-known theoretical tool in maximal monotone operator theory, the Fitzpatrick function naturally leads to a refined Fenchel-Young inequality, making Fitzpatrick losses tighter than Fenchel-Young losses while maintaining the same link function for prediction. As an example, we introduce the Fitzpatrick logistic loss and the Fitzpatrick sparsemax loss, counterparts of the logistic and the sparsemax losses. This yields two new tighter losses associated with the soft argmax and the sparse argmax, two of the most ubiquitous output layers used in machine learning. We study in detail the properties of Fitzpatrick losses and, in particular, we show that they can be seen as Fenchel-Young losses using a modified, target-dependent generating function.
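For reference, the two Fenchel-Young losses that the Fitzpatrick losses are built to tighten can be written in a few lines, assuming one-hot targets e_k. The logistic loss is logsumexp(θ) - θ_k, and the sparsemax loss uses Ω(p) = ½||p||^2 on the simplex, whose link function is the Euclidean projection onto the simplex (sparsemax). The sketch covers only these Fenchel-Young counterparts; it does not implement the Fitzpatrick losses themselves, and the function names are illustrative.

```python
import math


def logistic_fy_loss(theta, k):
    """FY loss for the Shannon negentropy on the simplex: logsumexp(theta) - theta[k]."""
    m = max(theta)  # shift for numerical stability
    lse = m + math.log(sum(math.exp(t - m) for t in theta))
    return lse - theta[k]


def sparsemax(theta):
    """Euclidean projection of theta onto the probability simplex (sort-based algorithm)."""
    z = sorted(theta, reverse=True)
    cumsum, k_max, tau_sum = 0.0, 0, 0.0
    for k, zk in enumerate(z, start=1):
        cumsum += zk
        if 1.0 + k * zk > cumsum:  # coordinate k is still in the support
            k_max, tau_sum = k, cumsum
    tau = (tau_sum - 1.0) / k_max  # threshold; k_max >= 1 always
    return [max(t - tau, 0.0) for t in theta]


def sparsemax_fy_loss(theta, k):
    """FY loss for Omega(p) = 0.5 * ||p||^2 on the simplex, with one-hot target e_k."""
    p = sparsemax(theta)
    conj = sum(t * pi for t, pi in zip(theta, p)) - 0.5 * sum(pi * pi for pi in p)  # Omega*(theta)
    return conj + 0.5 - theta[k]  # Omega(e_k) = 0.5
```

Both losses are zero exactly when the link function (softmax or sparsemax) returns the target; softmax never does for finite scores, which is one reason sparsemax is attractive.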

artificial intelligence, machine learning, proceedings, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.77)


7990ec44fcf3d7a0e5a2add28362213c-Paper.pdf

Neural Information Processing Systems, Feb-12-2026, 16:21:52 GMT

We propose in this paper a general framework for deriving loss functions for structured prediction. In our framework, the user chooses a convex set including the output space and provides an oracle for projecting onto that set.
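As a minimal sketch of such a projection-based loss, assume Euclidean geometry and take the convex set C to be the unit cube [0, 1]^d, which contains multilabel outputs in {0, 1}^d and whose projection oracle is coordinate-wise clipping. The Fenchel-Young loss generated by ½||·||^2 restricted to C then equals ½||y - θ||^2 - ½||P_C(θ) - θ||^2, with gradient P_C(θ) - y. The set, the oracle and the function names are assumptions for illustration; the framework itself lets the user supply other sets and oracles.

```python
def project_unit_cube(theta):
    """Euclidean projection onto [0, 1]^d: coordinate-wise clipping."""
    return [min(max(t, 0.0), 1.0) for t in theta]


def projection_fy_loss(theta, y, project=project_unit_cube):
    """FY loss generated by 0.5 * ||.||^2 restricted to a convex set C (given by its projection).

    Equals 0.5 * ||y - theta||^2 - 0.5 * ||P_C(theta) - theta||^2; gradient is P_C(theta) - y.
    """
    p = project(theta)
    loss = 0.5 * sum((yi - t) ** 2 for yi, t in zip(y, theta)) - 0.5 * sum(
        (pi - t) ** 2 for pi, t in zip(p, theta)
    )
    grad = [pi - yi for pi, yi in zip(p, y)]
    return loss, grad
```

Only the projection oracle touches the set C, so swapping in another convex set means passing a different `project` function.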

artificial intelligence, inproc, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.05)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
Asia > Japan > Honshū > Chūbu > Nagano Prefecture > Nagano (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


Non-Stationary Online Structured Prediction with Surrogate Losses

Sakaue, Shinsaku, Bao, Han, Cao, Yuzhou

arXiv.org Artificial Intelligence, Oct-9-2025

Online structured prediction, including online classification as a special case, is the task of sequentially predicting labels from input features. There, the surrogate regret, defined as the cumulative excess of the target loss (e.g., 0-1 loss) over the surrogate loss (e.g., logistic loss) of the fixed best estimator, has gained attention, particularly because it often admits a finite bound independent of the time horizon $T$. However, such guarantees break down in non-stationary environments, where every fixed estimator may incur a surrogate loss that grows linearly with $T$. We address this by proving a bound of the form $F_T + C(1 + P_T)$ on the cumulative target loss, where $F_T$ is the cumulative surrogate loss of any comparator sequence, $P_T$ is its path length, and $C > 0$ is some constant. This bound depends on $T$ only through $F_T$ and $P_T$, often yielding much stronger guarantees in non-stationary environments. Our core idea is to synthesize the dynamic regret bound of online gradient descent (OGD) with the technique of exploiting the surrogate gap. Our analysis also sheds light on a new Polyak-style learning rate for OGD, which systematically offers target-loss guarantees and exhibits promising empirical performance. We further extend our approach to a broader class of problems via the convolutional Fenchel-Young loss. Finally, we prove a lower bound showing that the dependence on $F_T$ and $P_T$ is tight.
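The abstract does not give its Polyak-style learning rate, so the sketch below substitutes the classic Polyak step η_t = ℓ_t(w_t) / ||∇ℓ_t(w_t)||^2, which assumes an optimal surrogate value of 0, in online gradient descent on the logistic loss. It tracks the target (0-1) loss alongside the surrogate. Because the logistic loss is at least ln 2 whenever the margin is non-positive, the number of mistakes is at most the cumulative logistic loss divided by ln 2. All names and the drifting stream are hypothetical, and this is not the paper's algorithm.

```python
import math


def logistic_loss(margin):
    """Numerically stable log(1 + exp(-margin))."""
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def sigmoid_neg(margin):
    """Stable 1 / (1 + exp(margin))."""
    if margin >= 0:
        e = math.exp(-margin)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(margin))


def polyak_ogd_step(w, x, y):
    """One OGD step on the logistic loss with the classic Polyak step size.

    grad = -sigmoid(-m) * y * x and eta = loss / ||grad||^2, so the update is
    w + coef * y * x with coef = eta * sigmoid(-m) = loss / (sigmoid(-m) * ||x||^2).
    """
    m = y * sum(wi * xi for wi, xi in zip(w, x))
    s = sigmoid_neg(m)
    sq_norm = sum(xi * xi for xi in x)
    if s == 0.0 or sq_norm == 0.0:
        return w  # gradient is (numerically) zero: nothing to update
    coef = logistic_loss(m) / (s * sq_norm)
    return [wi + coef * y * xi for wi, xi in zip(w, x)]


def run_stream(stream, dim):
    """Play online binary classification; return (# mistakes, cumulative logistic loss)."""
    w = [0.0] * dim
    mistakes, surrogate = 0, 0.0
    for x, y in stream:
        m = y * sum(wi * xi for wi, xi in zip(w, x))
        mistakes += 1 if m <= 0 else 0  # 0-1 loss, counting ties as mistakes
        surrogate += logistic_loss(m)
        w = polyak_ogd_step(w, x, y)
    return mistakes, surrogate


# Hypothetical drifting stream: the labelling rule flips halfway through.
stream = []
for t in range(200):
    x = (math.cos(0.7 * t), math.sin(0.7 * t))
    y = 1 if x[0] >= 0 else -1
    stream.append((x, y if t < 100 else -y))
mistakes, surrogate = run_stream(stream, dim=2)
```

With this step, the updated margin on the current example is m + ℓ(m)/σ(-m), which exceeds m + ℓ(m) > 0, so each update classifies the example it was made on correctly.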

artificial intelligence, inductive learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2510.07086

Country: Asia (0.28)

Genre: Research Report (1.00)

Industry: Education > Educational Setting > Online (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)


Learning with Fitzpatrick Losses

Neural Information Processing Systems, May-27-2025, 08:52:48 GMT

fenchel-young loss, fitzpatrick loss, link function, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)


Learning from Samples: Inverse Problems over measures via Sharpened Fenchel-Young Losses

Andrade, Francisco, Peyré, Gabriel, Poon, Clarice

arXiv.org Machine Learning, May-13-2025

Estimating parameters from samples of an optimal probability distribution is essential in applications ranging from socio-economic modeling to biological system analysis. In these settings, the probability distribution arises as the solution to an optimization problem that captures either static interactions among agents or the dynamic evolution of a system over time. Our approach relies on minimizing a new class of loss functions, called sharpened Fenchel-Young losses, which measure the sub-optimality gap of the optimization problem over the space of measures. We study the stability of this estimation method when only a finite number of samples is available. The parameters to be estimated typically correspond to a cost function in static problems and to a potential function in dynamic problems. To analyze stability, we introduce a general methodology that leverages the strong convexity of the loss function together with the sample complexity of the forward optimization problem. Our analysis emphasizes two specific settings in the context of optimal transport, where our method provides explicit stability guarantees. The first is inverse unbalanced optimal transport (iUOT) with entropic regularization, where the parameters to estimate are cost functions that govern transport computations; this method has applications such as link prediction in machine learning. The second is inverse gradient flow (iJKO), where the objective is to recover a potential function that drives the evolution of a probability distribution via the Jordan-Kinderlehrer-Otto (JKO) time-discretization scheme; this is particularly relevant for understanding cell population dynamics in single-cell genomics. Finally, we validate our approach through numerical experiments on Gaussian distributions, where closed-form solutions are available, to demonstrate the practical performance of our methods.
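The abstract does not say how the forward entropic unbalanced OT problem is solved; a common choice is Sinkhorn-type scaling. The minimal sketch below computes an entropic transport plan from a cost matrix, which is the quantity whose cost parameters iUOT would estimate. It assumes the KL-relaxed form of unbalanced OT, for which each scaling update is raised to the power ρ / (ρ + ε); setting `rho=None` recovers balanced Sinkhorn. The names and this particular formulation are assumptions, not the paper's implementation.

```python
import math


def sinkhorn(a, b, cost, eps, rho=None, iters=500):
    """Entropic OT plan via Sinkhorn-type scaling iterations.

    rho=None gives balanced OT (exact marginal constraints); a finite rho relaxes the
    marginals with a KL penalty of weight rho, which damps each update with the
    exponent rho / (rho + eps).
    """
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]  # Gibbs kernel
    power = 1.0 if rho is None else rho / (rho + eps)
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        u = [(a[i] / sum(K[i][j] * v[j] for j in range(m))) ** power for i in range(n)]
        v = [(b[j] / sum(K[i][j] * u[i] for i in range(n))) ** power for j in range(m)]
    # Plan P = diag(u) K diag(v).
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
```

In an inverse problem the cost matrix would be parameterized and fitted so that plans like this one explain observed samples; that outer loop is not shown.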

artificial intelligence, machine learning, optimization problem, (18 more...)

arXiv.org Machine Learning

2505.07124

Country:

Asia > Middle East > Jordan (0.24)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy (0.04)

Genre: Research Report (0.49)

Industry:

Energy (0.46)
Health & Medicine > Pharmaceuticals & Biotechnology (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)


Primal-dual algorithm for contextual stochastic combinatorial optimization

Bouvier, Louis, Prunet, Thibault, Leclère, Vincent, Parmentier, Axel

arXiv.org Artificial Intelligence, May-9-2025

This paper introduces a novel approach to contextual stochastic optimization, integrating operations research and machine learning to address decision-making under uncertainty. Traditional methods often fail to leverage contextual information, which underscores the necessity for new algorithms. In this study, we utilize neural networks with combinatorial optimization layers to encode policies. Our goal is to minimize the empirical risk, which is estimated from past data on uncertain parameters and contexts. To that end, we present a surrogate learning problem and a generic primal-dual algorithm that is applicable to various combinatorial settings in stochastic optimization. Our approach extends classic Fenchel-Young loss results and introduces a new regularization method using sparse perturbations on the distribution simplex. This allows for tractable updates in the original space and can accommodate diverse objective functions. We demonstrate the linear convergence of our algorithm under certain conditions and provide a bound on the non-optimality of the resulting policy in terms of the empirical risk. Experiments on a contextual stochastic minimum weight spanning tree problem show that our algorithm is efficient and scalable, achieving performance comparable to imitation learning of solutions computed using an expensive Lagrangian-based heuristic.
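In the policies described above, a neural network predicts edge weights and a combinatorial optimization layer turns them into a decision. For the minimum weight spanning tree experiments, such a layer could be Kruskal's algorithm with union-find; the sketch below shows only this layer, under a hypothetical name, and not the paper's primal-dual algorithm or its sparse perturbations.

```python
def min_spanning_tree(n_nodes, edges, weights):
    """Kruskal's algorithm: return a 0/1 indicator vector of a minimum weight spanning tree.

    edges[i] = (u, v) is an undirected edge; weights[i] is its (predicted) weight.
    """
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    chosen = [0] * len(edges)
    for i in sorted(range(len(edges)), key=weights.__getitem__):
        ru, rv = find(edges[i][0]), find(edges[i][1])
        if ru != rv:  # adding edge i does not close a cycle
            parent[ru] = rv
            chosen[i] = 1
    return chosen
```

Because the output is a 0/1 vector that changes only when the ordering of the weights changes, the layer is piecewise constant in the weights; this is one reason surrogate losses such as Fenchel-Young losses are used for training rather than differentiating through the layer directly.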

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2505.04757

Genre:

Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)


Towards Understanding the Optimization Mechanisms in Deep Learning

Qi, Binchuan, Gong, Wei, Li, Li

arXiv.org Artificial Intelligence, Mar-29-2025

Key insights from the studies of Arjevani and Field (2022); Chizat, Oyallon, and Bach (2018); Du, Zhai, Póczos, and Singh (2018); and Yun, Sra, and Jadbabaie (2018) emphasize the pivotal role of over-parameterization in finding the global optimum and enhancing the generalization ability of deep neural networks (DNNs). Recent work has shown that the evolution of the trainable parameters in continuous-width DNNs during training can be captured by the neural tangent kernel (NTK) (Arora, Du, Hu, Li, & Wang, 2019; Du, Lee, Li, Wang, & Zhai, 2018; Jacot, Gabriel, & Hongler, 2018; Mohamadi, Bae, & Sutherland, 2023; Wang, Li, & Sun, 2023; Zou, Cao, Zhou, & Gu, 2018). An alternative research direction examines the infinite-width neural network from a mean-field perspective (Chizat & Bach, 2018; Mei, Montanari, & Nguyen, 2018; Nguyen & Pham, 2023; Sirignano & Spiliopoulos, 2018). However, neural networks used in practice have finite width, and in this regime it remains unclear whether NTK theory and mean-field theory can adequately characterize their convergence properties (Seleznova & Kutyniok, 2021). Therefore, the mechanisms of non-convex optimization in deep learning, and the impact of over-parameterization on model training, remain only partially understood.

artificial intelligence, machine learning, neural network, (16 more...)

arXiv.org Artificial Intelligence

2503.23016

Country:

Asia > China > Shanghai > Shanghai (0.05)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
